Research on social and economic factors influencing regional mortality patterns in China

Regional population mortality correlates with regional socioeconomic development. This study aimed to identify the key socioeconomic factors influencing mortality patterns in Chinese provinces. Using data from the Seventh Population Census, we analyzed mortality patterns by gender and urban‒rural division in 31 provinces. Using a functional regression model, we assessed the influence of fourteen indicators on mortality patterns. Main findings: (1) China shows notable gender and urban‒rural mortality variations across age groups. Males generally have higher mortality than females, and rural areas experience elevated mortality rates compared to urban areas. Mortality in individuals younger than 40 years is influenced mainly by urban‒rural factors, with gender becoming more noticeable in the 40–84 age group. (2) The substantial marginal impact of socioeconomic factors on mortality patterns generally becomes evident after the age of 45, with less pronounced differences in their impact on early-life mortality patterns. (3) Various factors have age-specific impacts on mortality. Education has a negative effect on mortality in individuals aged 0–29, extending to those aged 30–59 and diminishing in older age groups. Urbanization positively influences the probability of death in individuals aged 45–54 years, while the impact of traffic accidents increases with age. Among elderly people, the effect of socioeconomic variables is smaller, highlighting the intricate and heterogeneous nature of these influences and acknowledging certain limitations.


Research on factors influencing mortality patterns
Mackenbach et al. conducted a regression-based analysis comparing inequality indices in mortality and self-rated health across 22 European countries, revealing higher mortality rates and poorer self-rated health in socioeconomically disadvantaged groups [10]. Kibele et al. found that factors contributing to spatial variations in mortality in Germany changed significantly over time, with a strengthened connection between regional socioeconomic conditions and mortality [2]. Mackenbach's review of theories explaining social inequality suggested that health inequality plays a role in exacerbating social inequality [11]. Lutz and Kebede's study across 174 countries from 1970 to 2010 found education to be a better predictor, linking improved education to increased income and better health outcomes [12]. Gutin and Hummer's review highlighted the association between social status and health, where high socioeconomic status individuals may extend their lives beyond biological expectations, while disadvantages due to social status could lead to premature mortality for lower-status individuals [13].
Several studies have investigated the intricacies of factors affecting mortality rates in China. Zhao's macrolevel exploration identified socioeconomic factors impacting post-1949 mortality rates [14]. Nie & Song empirically analyzed the factors that influenced infant mortality rates between 1996 and 2002 [0]. Luo & Xie uncovered contextual determinants for the influence of socioeconomic indicators on elderly mortality, highlighting the nuanced sociopolitical landscape of China [15]. Li's work revealed the significant impact of healthcare resource investment on regional disparities in elderly health [16]. Chen et al. conducted a comprehensive investigation into various factors shaping population mortality, spanning from wastewater pollution to healthcare improvements [17]. Fan's latent socioeconomic status variable stressed the pivotal role of higher socioeconomic status in mitigating mortality risk among Chinese individuals aged 65 and above [18]. Chen et al.'s analysis of PM2.5 pollution demonstrated a positive correlation with mortality rates, coupled with spatial spillover effects [9]. Ying & Li highlighted the prominent role of population aging, industrial emissions, climatic conditions, and per capita GDP in shaping mortality rates [19].

Research methods for regional mortality patterns
In the study of mortality patterns and the influencing factors, scholars often use statistical regression models and similar methodologies. For instance, Lutz & Kebede employed multivariate statistical analysis to explore the impacts of education and income on global mortality patterns [12]. The Cox proportional hazards model is a common choice, as seen in the work of Luo & Xie, who used it to study the mechanisms through which socioeconomic factors influence mortality rates in China [15]. Fan also utilized the Cox proportional hazards regression model to compute hazard ratios and conducted subgroup analyses based on various factors [18]. Other techniques, such as multiple regression analysis [9], global spatial autocorrelation models, local spatial autocorrelation models, and spatial regression analysis methods [17], are employed to investigate various aspects of regional mortality patterns.
In summary, current research on regional mortality disparities in China faces several limitations. First, prior studies often attribute differences in mortality rates to a limited set of socioeconomic factors, and a comprehensive analysis of the overarching socioeconomic forces is lacking. Second, there is a predominant focus on socioeconomic impacts within specific age groups, with insufficient analyses covering the entire mortality pattern. Third, most studies use classical statistical regression analyses, while this paper adopts more recent functional regression models, a method still underutilized in population studies.

Research method and variable selection
In this study, a mortality pattern was employed as a measure of regional population mortality levels. The mortality pattern represents the distribution of age-specific mortality probabilities and can be considered a function of population age. We utilized the function-on-scalar functional regression model for our analysis, drawing inspiration from the foundational principles outlined in the works of J.O. Ramsay and B.W. Silverman [20,21] and relevant theoretical literature [22–29]. The basic principles of the functional regression model are outlined as follows.

Representation of functional data
The subject of functional data analysis deals with a set of smooth curves, denoted as $y^0_n(t)$, $t \in [T_0, T_1]$, $1 \le n \le N$, comprising $N$ such curves. For a given sample curve, we can only observe a finite number of points, i.e., $\{y^0_{n,j} \in \mathbb{R},\ 1 \le n \le N,\ 1 \le j \le J_n\}$, with each curve having $J_n$ observed points. Functional data possess infinite dimensional characteristics, and for the sake of data modeling and statistical inference, dimension reduction techniques are often applied. One common approach is to expand the functional data using basis functions. Given the known discrete points, fitting with basis functions allows the representation of the original functional data as a linear combination of these basis functions. Let $\{\varphi_k,\ 1 \le k \le K\}$ be the basis functions, then the fitted function can be expressed as follows:

$$y_n(t) = \sum_{k=1}^{K} c_{n,k}\,\varphi_k(t),$$

where $c_{n,k}$ represents the coefficient for each basis function $\varphi_k(t)$, and the original data can be expressed as follows:

$$y^0_{n,j} = y_n(t_{n,j}) + \varepsilon_{n,j},$$

where $t_{n,j}$ is the $j$-th observation point of curve $n$ and $\varepsilon_{n,j}$ represents the error terms, which are independently and identically distributed with a mean of zero and a variance of $\sigma^2$.
The basis functions can be specified as periodic Fourier bases, nonperiodic B-spline bases, or wavelet bases. Alternatively, data-driven functional principal component bases can be selected, but this necessitates performing principal component analysis (PCA) on the functional data beforehand.
If the original data are affected by a significant level of noise, the expansion of the basis functions becomes increasingly distorted as the number of basis functions increases. Having a relatively small number of basis functions, on the other hand, will also limit the shape of their linear combinations. Therefore, we employ a roughness penalty method to smooth the functional data:

$$\mathrm{PENSSE}_{\lambda} = \sum_{j=1}^{J_n} \Big[ y^0_{n,j} - \sum_{k=1}^{K} c_{n,k}\,\varphi_k(t_{n,j}) \Big]^2 + \lambda \int_{T_0}^{T_1} \big[ L\,y_n(t) \big]^2\,dt,$$

with $t_{n,j}$ the $j$-th observation point of curve $n$. At this point, we need to find the coefficients $c_{n,k}$ to minimize the penalized sum of squares, where $\lambda \in \mathbb{R}^+$ is the smoothing parameter, and $L$ is a linear differential operator. To strike a balance between overfitting and oversmoothing by adjusting $\lambda$, we utilized the generalized cross-validation (GCV) method.
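The smoothing step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cubic B-spline basis, the second-difference penalty on the coefficients (a discrete stand-in for the integral of $[L y_n(t)]^2$, in the style of P-splines), the grid of candidate values for the smoothing parameter, and the synthetic curve are all assumptions made for illustration, as are the function names.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(t, n_basis, degree=3):
    """Evaluate n_basis clamped B-spline basis functions at the points t."""
    lo, hi = float(np.min(t)), float(np.max(t))
    interior = np.linspace(lo, hi, n_basis - degree + 1)[1:-1]
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    # Identity coefficients make one BSpline object return every basis function.
    return BSpline(knots, np.eye(n_basis), degree)(np.asarray(t, dtype=float))


def penalized_smooth(t, y, n_basis=10, lam=1.0):
    """Return (fitted values, effective degrees of freedom, GCV score)."""
    phi = bspline_basis(t, n_basis)
    # Assumption: second differences of the coefficients approximate the roughness penalty.
    d2 = np.diff(np.eye(n_basis), n=2, axis=0)
    hat = phi @ np.linalg.solve(phi.T @ phi + lam * d2.T @ d2, phi.T)
    fitted = hat @ y
    n, edf = len(y), float(np.trace(hat))
    gcv = n * float(np.sum((y - fitted) ** 2)) / (n - edf) ** 2
    return fitted, edf, gcv


def select_lambda(t, y, grid, n_basis=10):
    """Pick the smoothing parameter with the smallest GCV score."""
    scores = [penalized_smooth(t, y, n_basis, lam)[2] for lam in grid]
    return grid[int(np.argmin(scores))]


# Hypothetical example: 20 age points and a made-up J-shaped curve plus noise.
rng = np.random.default_rng(0)
ages = np.linspace(0.0, 92.0, 20)
truth = 0.004 * np.exp(-ages / 2.0) + 1e-4 * np.exp(0.085 * ages)
noisy = truth + rng.normal(scale=0.005, size=ages.size)
grid = 10.0 ** np.arange(-4, 5)
best_lam = select_lambda(ages, noisy, grid)
smooth, edf, gcv = penalized_smooth(ages, noisy, lam=best_lam)
```

A small smoothing parameter leaves the effective degrees of freedom close to the number of basis functions, while a very large one shrinks the fit toward the two-dimensional null space of the penalty; GCV picks a value between these extremes.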

Functional regression model
Functional regression analysis encompasses various models, such as scalar-to-function regression, function-to-scalar regression, function-to-function regression, mixed-variable regression, nonlinear regression of scalars to functions, and functional generalized linear models. Given the specific problem under investigation, this paper employs a function-to-scalar (function-on-scalar) regression analysis model, expressed as follows:

$$y_n(t) = \sum_{i=1}^{q} x_{ni}\,\beta_i(t) + \varepsilon_n(t), \quad 1 \le n \le N.$$

The function variable $y_n(t)$, $1 \le n \le N$, observed $N$ times, is an $N$-dimensional vector. It needs to be generated using the observed values $\{y^0_{n,j} \in \mathbb{R},\ 1 \le n \le N,\ 1 \le j \le J_n\}$ from the initial data. $x_{ni}$, $1 \le n \le N$, $1 \le i \le q$, is a set of $q$ covariate scalars. $\beta_i(t)$, $1 \le i \le q$, represents $q$ functional regression parameters, also known as effect functions, which can be expressed using the basis functions:

$$\beta_i(t) = \sum_{k=1}^{K} b_{i,k}\,\varphi_k(t).$$

Assuming that the observation points are $\{t_j : 1 \le j \le J\}$, the model can be represented in matrix form as:

$$Y = XB + E.$$

Here, $Y = [y_n(t_j)] \in \mathbb{R}^{N \times J}$, $X = [x_{ni}] \in \mathbb{R}^{N \times q}$, $B = [\beta_i(t_j)] \in \mathbb{R}^{q \times J}$, and $E = [\varepsilon_n(t_j)] \in \mathbb{R}^{N \times J}$. As a result, the penalized least squares used for the regression can be defined as:

$$\mathrm{PENSSE}(B) = \lVert Y - XB \rVert^2 + \sum_{i=1}^{q} \lambda_i \int \big[ L\,\beta_i(t) \big]^2\,dt.$$

Similarly, we need to find the coefficients $B$ to minimize the sum of squares with the penalty mentioned above. In this paper, we assume that a single smoothing parameter $\lambda_i = \lambda$, $1 \le i \le q$, imparts a uniform level of smoothness to each component, and we employ the smoothing parameter cross-validation (CV) method for selection.
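As a rough illustration of the matrix form $Y = XB + E$ with a single smoothing parameter, the following Python sketch estimates the coefficient curves on a common grid. It is a simplified stand-in, not the authors' procedure: the roughness penalty is approximated by second differences of each coefficient curve along the grid, the resulting normal equations are solved as a Sylvester equation, and the data, dimensions, and function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def fit_function_on_scalar(X, Y, lam=0.0):
    """Minimise ||Y - X B||^2 + lam * ||second differences of each row of B||^2.

    X is (N, q): scalar covariates (include a column of ones for an intercept).
    Y is (N, J): the N response curves evaluated on a common grid t_1..t_J.
    Returns B as (q, J): row i holds beta_i(t_j).
    """
    J = Y.shape[1]
    d2 = np.diff(np.eye(J), n=2, axis=0)  # (J-2, J) second-difference operator
    # Setting the gradient to zero gives (X'X) B + B (lam * d2'd2) = X'Y,
    # which is a Sylvester equation in B.
    return solve_sylvester(X.T @ X, lam * (d2.T @ d2), X.T @ Y)


def roughness(B):
    """Sum of squared second differences of every coefficient curve."""
    return float(np.sum(np.diff(B, n=2, axis=1) ** 2))


# Hypothetical data: 124 curves on 20 grid points, an intercept and two covariates.
rng = np.random.default_rng(1)
N, J = 124, 20
t = np.linspace(0.0, 1.0, J)
X = np.column_stack([np.ones(N), rng.integers(0, 2, N), rng.normal(size=N)])
B_true = np.vstack([0.1 + t ** 3, 0.05 * t, -0.03 * t ** 2])
Y = X @ B_true + rng.normal(scale=0.02, size=(N, J))

B_ols = fit_function_on_scalar(X, Y, lam=0.0)      # no penalty: pointwise least squares
B_smooth = fit_function_on_scalar(X, Y, lam=50.0)  # penalised: smoother effect curves
```

With `lam=0.0` the estimate coincides with ordinary least squares applied separately at each grid point; increasing `lam` trades fidelity for smoother effect functions, which is the balance that cross-validation is meant to tune.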
The advantage of using functional data regression analysis lies in its consideration of the holistic nature of the functional dependent variable. Our study focuses on the overall mortality pattern, representing a complete function curve.

Functional hypothesis testing
In the empirical section, two hypothesis tests were employed: the functional F test and the pointwise t test. The pointwise t test is analogous to the traditional t test, with the initial step involving discretization of functional data as needed, followed by steps identical to those in the traditional t test. Further details are not provided here.
The following section introduces the functional F test method derived from functional regression analysis [27,30]. Without loss of generality, we assume the original model to be

$$y_n(t) = \sum_{i=1}^{q} x_{ni}\,\beta_i(t) + \varepsilon_n(t).$$

Write the alternative hypothesis (Model 1) as the full model, in which $\beta_i(t)$ is not identically zero, and the null hypothesis (Model 0i) as the reduced model with $\beta_i(t) = 0$ for all $t$. The functional F test statistic is built from the residual sums of squares of the two models. Here, $ss = \sum_{n=1}^{N} \int \big( y_n(t) - \hat{y}_n(t) \big)^2\,dt$, $y_n(t)$ represents the initial functional curve, and $\hat{y}_n(t)$ represents the regression-predicted functional curve.
The rule for rejecting the null hypothesis involves $E$, the empirical covariance matrix of the error process obtained from the alternative hypothesis.
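To make the role of the integrated residual sum of squares $ss$ concrete, here is a hedged Python sketch. The exact statistic and rejection rule used in the paper are not reproduced here; the ratio $(ss_0 - ss_1)/ss_1$ is one common F-type form and is an assumption, and the permutation-based p-value is a swapped-in, approximate substitute for the covariance-based rule. All names and data are hypothetical.

```python
import numpy as np


def integrated_rss(X, Y, t):
    """ss = sum over curves of the integral of the squared residual (trapezoidal rule)."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # pointwise least squares fit
    sq = (Y - X @ B) ** 2                       # (N, J) squared residuals
    return float(np.sum(0.5 * (sq[:, 1:] + sq[:, :-1]) * np.diff(t)))


def functional_f(X, Y, t, i):
    """Assumed F-type statistic comparing Model 1 with Model 0i (covariate i removed)."""
    ss1 = integrated_rss(X, Y, t)
    ss0 = integrated_rss(np.delete(X, i, axis=1), Y, t)
    return (ss0 - ss1) / ss1


def permutation_pvalue(X, Y, t, i, n_perm=999, seed=0):
    """Approximate p-value obtained by shuffling covariate i (a stand-in null distribution)."""
    rng = np.random.default_rng(seed)
    observed = functional_f(X, Y, t, i)
    count = 0
    for _ in range(n_perm):
        Xp = X.copy()
        Xp[:, i] = rng.permutation(X[:, i])
        count += int(functional_f(Xp, Y, t, i) >= observed)
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical data: covariate 1 has a real effect, covariate 2 has none.
rng = np.random.default_rng(2)
N, J = 60, 20
t = np.linspace(0.0, 1.0, J)
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.normal(size=N)])
Y = np.outer(X[:, 1], np.sin(np.pi * t)) + rng.normal(scale=0.1, size=(N, J))

f_effect, p_effect = permutation_pvalue(X, Y, t, i=1, n_perm=199)
f_noise = functional_f(X, Y, t, i=2)
```

Shuffling a single column is only approximately valid when covariates are correlated, so the sketch should be read as an illustration of how $ss$ enters the comparison of nested models rather than as a replacement for the test described above.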

Data processing and variable selection
The data for this study were derived from the 2020 China Population Census Yearbook published by the National Bureau of Statistics of China, the 2021 China Health Statistics Yearbook, and annual provincial-level data from the National Data Website. These data are given in Online Resource 1. The dataset covers all 31 provinces in mainland China. The dependent variable data consist of mortality probabilities by age, gender, and the urban-rural classification for each of the 31 provincial administrative regions. The independent variables encompass gender, the urban-rural classification, and various socioeconomic variables. Variable selection was carried out through the following steps: First, we utilized provincial mortality rate data from the Seventh National Population Census as the basis for estimating mortality probabilities for 22 age groups. Mortality probabilities were calculated for both gender and urban-rural subgroups, serving as measures of mortality levels for the 31 provinces.
The original dependent variable data were the age-specific mortality probabilities by sex and urban/rural classification for the 31 provincial-level regions. The age-specific mortality probability is the probability that a person who has reached a certain age will die before reaching another specific age, and it is usually calculated from the known mortality rate [31]. The age-specific mortality rate data were obtained from the 7th Chinese census, and the formula was

$$m_x = \frac{D_x}{P_x},$$

where $m_x$ represents the mortality rate at the age of $x$, $D_x$ represents the number of deaths in the age group of $x$, and $P_x$ represents the population in the age group of $x$.
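The relation between the mortality rate and the mortality probability can be sketched in Python as follows. The rate $m_x = D_x / P_x$ follows the definition above. The conversion to a probability uses the textbook approximation $q = 2 n m / (2 + n m)$, which assumes deaths are spread evenly over an age interval of width $n$; this is an assumption and may differ from the formula the paper applies, in particular for age 0 and for the open-ended last age group. The function names and the input numbers are hypothetical.

```python
import numpy as np


def death_rate(deaths, population):
    """Age-specific mortality rate m_x = D_x / P_x."""
    return np.asarray(deaths, dtype=float) / np.asarray(population, dtype=float)


def rate_to_probability(m, n):
    """Assumed conversion q = 2 n m / (2 + n m) for an age interval of width n."""
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    return 2.0 * n * m / (2.0 + n * m)


# Hypothetical counts for three five-year age groups (not census figures).
deaths = [50, 120, 900]
population = [10_000, 12_000, 15_000]
m = death_rate(deaths, population)
q = rate_to_probability(m, n=5)
```

For small rates the probability is close to $n \cdot m_x$, and it always stays below 1 for a finite rate, which is why the open-ended last age group has to be handled separately.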
If the age group distance is $n$, the age-specific mortality probability is computed from $m_x$, with a separate expression used when $x = 0$. In the last age group, the age-specific mortality probabilities are equal to 1. Because the mortality probability for individuals aged 100 years and above reaches a limiting value of 1 and the data for the 95-99 years age group exhibit substantial variations across provinces, these two age groups were excluded from our analysis.
Second, the impact of socioeconomic factors on population mortality levels is multifaceted. Previous research has established a strong correlation between the level of socioeconomic development, quality of life, medical care, and technological advancements in a region. Additionally, higher levels of education have been shown to effectively reduce mortality probabilities. Drawing from prior studies on the socioeconomic factors influencing mortality patterns and considering the comprehensiveness and availability of data, we selected 22 indicators. These indicators were subjected to functional stepwise regression analysis, and variables with poor regression performance and nonsignificant regression coefficients were removed. Based on previous experience, we initially selected seven indicators, including Per Capita Gross Domestic Product (PGDP), the urbanization rate, education, gender, and the urban-rural status, to construct the foundational functional regression model. The model's coefficient of determination ($R^2$) was relatively low. To enhance the model's performance, we gradually introduced different factor indicators and conducted variable selection based on p values from pointwise t tests.

During the stepwise introduction and screening of variables, we used pointwise t test p values as the criterion, considering the initial twenty age segments as data points for testing the regression coefficients of each factor. If all values in the p value vector were greater than 0.05, we removed that factor and continued to introduce new factors, repeating the process. Simultaneously, we removed certain independent variables, examined changes in the overall model's pointwise t test significance, and gradually selected statistically significant variables affecting the target variable, resulting in a relatively concise and effective model.
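The screening rule described above (drop a factor when every pointwise p value exceeds 0.05) can be sketched in Python. This is a simplified illustration, not the authors' code: it runs an ordinary, unpenalized least squares regression separately at each age point, and the data, effect sizes, and function names are hypothetical.

```python
import numpy as np
from scipy import stats


def pointwise_t_pvalues(X, Y):
    """Two-sided t test p values for every coefficient at every grid point, shape (q, J)."""
    N, q = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    B = xtx_inv @ X.T @ Y                               # (q, J) pointwise estimates
    resid = Y - X @ B
    sigma2 = np.sum(resid ** 2, axis=0) / (N - q)       # residual variance per grid point
    se = np.sqrt(np.outer(np.diag(xtx_inv), sigma2))    # (q, J) standard errors
    return 2.0 * stats.t.sf(np.abs(B / se), df=N - q)


def drop_candidates(pvals, alpha=0.05):
    """Indices of covariates whose p values exceed alpha at every grid point."""
    return np.where(np.all(pvals > alpha, axis=1))[0]


# Hypothetical data: 124 curves, 20 age points, an intercept and two covariates.
rng = np.random.default_rng(3)
N, J = 124, 20
X = np.column_stack([np.ones(N), rng.integers(0, 2, N), rng.normal(size=N)])
effect = np.linspace(0.0, 0.5, J)   # made-up effect that grows with age
Y = np.outer(X[:, 1], effect) + rng.normal(scale=0.1, size=(N, J))

pvals = pointwise_t_pvalues(X, Y)
to_drop = drop_candidates(pvals)
```

Because twenty tests are run per covariate, a pure-noise covariate can still show an isolated p value below 0.05 by chance, which is one reason the procedure is combined with judgment about which variables to retain.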
In the final model, although the variables health technicians and insured employees did not pass the pointwise significance test at the 0.05 level, they were important medical human resource and social security indicators and were therefore retained. Additionally, considering the overall p values of all independent variables, their inclusion strengthened the pointwise significance of all variables. The model also included the corresponding resource-related medical indicator (hospital beds) and social security variable (insured residents). Possible collinearity with these counterparts may have resulted in lower significance for the former two variables. The retention of these variables also facilitated a comparative analysis of their marginal effects with the corresponding variables.
Ultimately, we selected the gender indicator and thirteen socioeconomic indicators that influence regional mortality patterns, as shown in Table 1.

Empirical analysis
Using the outlined methods and weighted least squares fits, we derived mortality pattern curves for China's provincial regions in 2020 (Fig. 1) and categorized them by gender and urban-rural classification. Key observations from Fig. 1 include the following: (1) Mortality patterns exhibit a "J"-shaped distribution, with higher probabilities in infants and a gradual increase after age 40, increasing significantly after 70. (2) Females in both urban and rural populations show lower mortality probabilities across age groups than males. (3) Urban populations generally have lower mortality probabilities than do their rural counterparts of the same gender.
These findings not only lay the groundwork for identifying the factors influencing mortality patterns regionally but also offer important insights into the mechanisms through which these driving factors affect mortality probabilities across different age groups.

Overall regression results of the model
We established the functional regression model as follows:

$$y_i(t) = \beta_0(t) + \sum_{j=1}^{14} \beta_j(t)\,x_{ij} + \varepsilon_i(t), \quad i = 1, \dots, 124.$$

Here, $y_i(t)$ represents the smooth curves of mortality probabilities for the four mortality patterns across the 31 provincial regions, totaling 124 curves. $\beta_0(t)$ is the functional constant term, $\varepsilon_i(t)$ is the functional random error term, $x_{ij}$ is the value of the $j$-th explanatory variable for curve $i$, and $\beta_1(t), \dots, \beta_{14}(t)$ are the functional regression coefficients for the various variables. The meanings of the other symbols are detailed in Table 2. Figure 2 shows the regression coefficient curves for 14 explanatory variables.
Examining Fig. 2's 14 regression coefficient curves reveals that the influence of covariates on regional mortality patterns intensifies notably after age 45.Both demographic and socioeconomic factors show a significant increase in their impact on mortality probabilities, as evidenced by the substantial increase in the absolute values of the regression coefficients.Among the 14 coefficients, gender and urban-rural status had the greatest impact on overall mortality probabilities.The gender coefficient consistently holds a positive value, indicating higher male mortality probabilities across different age groups, all else being equal.Conversely, the urban-rural coefficient remains consistently negative, signifying lower mortality probabilities in urban areas than in rural areas across all age groups, attributable to various factors.
Figure 3 shows the goodness of fit of the functional regression model by age group.The model's goodness of fit, which surpassed 60%, notably exceeded 80% for the 10-14 years, 15-19 years, and 65-69 years age groups.In the nine other age groups (5-9 years, 20-24 years, 25-29 years, 30-34 years, 35-39 years, 40-44 years, 60-64 years, 70-74 years, and 75-79 years), the goodness of fit exceeded 70%.This indicates that the 14 selected demographic and socioeconomic indicators effectively explain the mortality probabilities for these age groups.However, the model's fit decreased notably for the older age groups, especially those older than 85 years.This implies the necessity of additional factors to elucidate mortality probabilities for the last two age groups.The lower coefficient of determination R 2 for the last two age groups, particularly for the 90-94 years age group, suggests the influence of unconsidered factors, such as biological and environmental variables, which will be incorporated in future research.
In Table 3, we use the functional F test to test the significance of each coefficient. The variables gender and urban-rural attributes are significant at the 0.1% level, which is consistent with the analysis above.

Analysis of the age-specific effects of the variables gender, urban-rural, hospital beds, and fatality rate
Table 4 shows the regression coefficient values and significance levels of the variables gender, urban-rural, hospital beds and fatality rate. The regression coefficients for the gender variable are significant for all age groups except the 1-4-year-old age group. For neonates (0 years old), the gender variable is significant at the 0.1 level. For the 90-94 age group, the significance level for the gender variable is 0.05. For all the other age groups, the p values of the t tests for the gender variable are less than 0.01. This finding aligns with previous research confirming that male mortality probabilities are generally greater than female mortality probabilities. Our study further revealed that in each age group, other conditions being equal, the male mortality probabilities were consistently greater than the female mortality probabilities. This indicates that gender-specific mortality patterns in China have transitioned into a "modern pattern" [33].
The urban-rural variable has a negative influence on mortality probabilities across all age groups, and its impact is significant at the 0.001 level for all 20 age groups.This implies that, under other equal conditions, urban mortality probabilities are generally lower than rural mortality probabilities.Unlike urban areas, rural regions face challenges with limited medical facilities, scarce healthcare resources, and poor health conditions, which can impact medical accessibility.Rural areas are more vulnerable to infectious diseases due to poor infrastructure.Urban areas, with better education and health awareness, exhibit improved lifestyles and overall enhanced health.Economic advantages in urban regions enable access to advanced medical care.The hospital beds variable significantly positively impacts the mortality probability for individuals aged 20-29, 40-69, and 85-94 years.This effect may be linked to variations in healthcare facility quality.Simply having more medical beds does not guarantee lower mortality if healthcare quality is inadequate.For the fatality rate, a significant impact is noted in the 85-94 age group, and it is important to highlight that this effect is negative.

Analysis of the effects of other indicators by age group
In Table 5, further regression analysis indicates that the variables health technicians and insured employees lack significance across all age groups, leading to their exclusion from further examination. Consequently, our focus narrows to 12 variables: 4 global and 8 local. We will proceed to assess the impact of these eight remaining factors in age-specific groups, ranking them by the magnitude of their influence (sorted by the absolute values of the coefficients). The results will be presented in three stages: low age group (0-29 years), mid age group (30-64 years), and high age group (65-94 years). We will discuss the noteworthy socioeconomic factors influencing each of these stages.
Table 6. Absolute values, significance, and ranking of the regression coefficients for the low age group. In each cell, the first row represents the indicator name, the second row denotes the absolute coefficient values, and the third row indicates the p value along with the significance symbols. Tables 7 and 8 share the same format. One asterisk (*) indicates significance at the 5% level, i.e., p < 0.05; two asterisks (**) indicate significance at the 1% level, i.e., p < 0.01; three asterisks (***) indicate significance at the 0.1% level, i.e., p < 0.001.
Table 6 and Fig. 4 present the marginal effects and ranking of various factors in the mortality patterns for the low age group. The results show that the urban-rural factor ranks first in terms of its impact on the seven population groups within the low age range. After urban-rural, the variables education and gender follow, with gender having a more significant impact on newborns (0 years) and young children (5-9 years), whereas gender has no significant influence on the 1- to 4-year-old age group. However, the influence of gender begins to increase in the subsequent 10-29-year age group, ranking second in terms of its effect on mortality probabilities. Newborns are affected by seven significant factors, while the 10-14-year-old and 15-19-year-old groups are influenced by only four factors.
Education significantly influences the 0-29 years age group, with p values mostly below 0.01 (except for the 15-19 age group). Higher education has a notable negative impact on mortality, particularly in the 0-9 age range, ranking second in influencing infant and toddler mortality. Overall, women with higher education levels exhibit increased self-care and prevention behaviors, contributing to reduced infant mortality probabilities [0]. In the 10-29 age group, education ranks fourth in terms of the degree of impact. Individuals with higher education levels tend to adopt healthier lifestyles and possess more health awareness.
Income reflects people's living standards and purchasing power.The mortality probability for the age group of 5-29 years is significantly influenced by income, ranking it as the third most influential factor.The data show that income has the most significant impact on the 15-to 19-year-old population, reducing mortality probability by 5.65E-4.
The proportion of urban and rural residents covered by basic medical insurance has a significant impact on the 0-4-year-old age group, with the most pronounced effect occurring for the 1-4-year-old age group, where the influence value peaks at -4.89E-4.
The urbanization rate variable significantly affects the mortality probabilities of infants and young children aged 0-4 years, ranking as the sixth most influential factor. For each percentage point increase in the urbanization rate, the mortality probability for newborns decreases by 5.87E-4, and for 1-4-year-olds, the mortality probability decreases by 2.16E-4. Urban areas have superior medical facilities, more healthcare institutions, and more advanced medical services, ensuring that residents access higher-quality medical resources [5].
The variable per capita GDP significantly affects infants and young children aged 0-9 years, consistently ranking fourth.However, the findings of this study reveal a positive influence of per capita GDP on infant mortality probabilities, which warrants further discussion.

Analysis of the impact of socioeconomic factors on mortality patterns-mid age group (30-64 years)
Table 7 and Fig. 5 present the marginal effects and ranking of various factors on the mortality patterns for the mid age group.
Among the socioeconomic factors significantly affecting the mortality probabilities of the mid age group, the factor gender gradually surpasses the urban-rural factor to become the primary factor. In addition, the most pronounced effect is observed for the education indicator, which ranks third in terms of its influence on the 30-49-year-old population and then falls to fifth place among the 50-59-year-old population, with no significant impact on the 60-64-year-old group. Educational attainment is often associated with occupation and economic status. Higher educational levels are typically accompanied by better employment opportunities and higher income levels, thus reducing health risks. The urbanization rate significantly impacts the mortality probabilities of the mid age group, particularly individuals aged 45-54, ranking fifth in terms of its influence. The effect is positive, implying that urban living might be associated with greater social and economic pressures, which can negatively impact the health of middle-aged individuals. Similarly, the proportion of the tertiary sector significantly affects the mortality probability of the 50- to 54-year-old group. With increasing age, traffic accidents have a progressively greater impact on the mortality probability, with coefficient values showing an upward trend from 9.26E-5 to 1.64E-3 and significant effects among the 50- to 64-year-old population.

Analysis of the impact of socioeconomic factors on mortality patterns-high age group (65-94 years)
Table 8 and Fig. 6 present the marginal effects and ranking of various factors in the mortality patterns for the high age group.
In the elderly population, the main influencing factors are the four global variables analyzed earlier. In the 65-84-year-old group, the primary influencing factor is gender, followed by the urban-rural factor. However, among the 85-94-year-old population, the influence of gender is lower than that of urban-rural. The urbanization rate has a more significant impact on the mortality probability of individuals aged 70-74 years. Education has no significant effect on the mortality probability of the high age group compared with the low and mid age groups. Some research has suggested that education might not be a strong indicator of the socioeconomic status of China's elderly population [15].
For the elderly population, the chosen variables inadequately explain the mortality probabilities, suggesting that additional factors play a crucial role in this age group.Age-related factors are prominent as the body undergoes natural aging, increasing the likelihood of mortality.Socioeconomic factors are not the primary determinants for this elderly segment.
In the high age group, some factors exhibit large coefficient values but remain nonsignificant.For example, both insured residents and insured employees have substantial negative effects on the mortality probability, and their impact becomes more pronounced with increasing age.Furthermore, income and consumption also have a certain effect on reducing the mortality probability of the elderly, but these factors were not significant.

Conclusion
Leveraging data from China's 7th Population Census, our research delved into the socioeconomic impact factors and mechanisms of mortality patterns, accounting for interprovincial disparities, urban-rural divides, and gender differences. This study also scrutinized their marginal effects and variations across age groups. The key findings include: (1) Regional mortality patterns in China show significant gender and urban-rural disparities across most age groups. In the same age group, the mortality probabilities are generally greater for males than for females, and they are generally greater in rural areas than in urban areas. The factor urban-rural dominates the mortality probabilities of individuals aged 39 and younger, while the factor gender becomes most significant in the 40-84 age group. (2) The marginal impact of socioeconomic factors, apart from gender and urban-rural distinctions, on mortality patterns typically becomes evident after the age of 45, with less noticeable variations in their impact on early-life mortality patterns.

Figure 3. Goodness-of-fit values for each age group.

Figure 5. Marginal impact ranking of factors in the mid age group.

Figure 6. Marginal impact ranking of factors in the high age group.
In comparison to traditional multivariate regression methods, functional regression analysis avoids separate regressions for different age groups and instead comprehensively considers the overall variation. Multivariate regression methods often perform local fitting, providing accurate results in specific age ranges but lacking a comprehensive study of the overall mortality pattern and the identification of factors influencing the entire pattern. In supplementary information II, we present a comparative analysis between the results of multivariate regression methods and our approach, demonstrating that functional data regression, in contrast to traditional methods, more comprehensively reveals the characteristics of the mortality pattern. This global analysis contributes to a deeper understanding of the overall patterns in population mortality.

Table 1. Attributes and definitions. Source: China Population Census Yearbook, China Health Statistics Yearbook and National Data Website, 2020-2021.
We categorized the selected indicators (Table 1) into three groups: demographic attributes, economic attributes, and social attributes. Demographic attribute indicators include age-specific mortality probabilities and gender variables. Economic attributes include PGDP and tertiary sector. Social attributes encompass a variety of indicators. The urban-rural indicator is a binary control variable with a value of 0 or 1. The urbanization rate serves as an indicator of the urban-rural development level, while income and consumption represent indicators of residents' living standards. Hospital beds are indicators of physical resource capacity in healthcare development, and health technicians serve as indicators of human resource capacity in healthcare development. The fatality rate is also used as an indicator of the level of healthcare development. Healthcare security level indicators include insured residents and insured employees. These two insurance indicators were calculated based on the number of people covered by basic medical insurance and the total number of permanent residents, as reported in the National Statistical Yearbook. Education served as an indicator of educational attainment. Traffic accidents are used as indicators of public safety levels. To standardize the different scales of the data for comparison, we normalized the covariate indicators using Z score transformation based on the mean and standard deviation of the original indicator data.
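The Z score transformation mentioned above can be written as a short Python sketch. The choice between the sample standard deviation (`ddof=1`, used here) and the population standard deviation is not stated in the text and is an assumption; the indicator values below are made up for illustration.

```python
import numpy as np


def z_score(X, ddof=1):
    """Standardise each column: subtract its mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)


# Hypothetical indicator matrix: rows are provinces, columns are two indicators
# on very different scales (for example, per capita GDP and an urbanization rate).
raw = np.array([
    [164_000.0, 87.5],
    [72_000.0, 63.9],
    [58_000.0, 57.4],
    [101_000.0, 72.1],
])
standardized = z_score(raw)
```

After the transformation every covariate has mean 0 and unit standard deviation, so the magnitudes of the regression coefficients for different indicators can be compared directly.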
Insured Employees: Proportion of employees covered by basic medical insurance (%). An indicator reflecting the coverage of social security for working populations.
Education: Average years of education for the population aged 15 and above (Years). A crucial indicator for measuring the level of education and human capital development.
Traffic Accidents: Total number of traffic accidents. An indicator measuring public safety levels and the risk of population mortality.

Scientific Reports | (2024) 14:10614 | https://doi.org/10.1038/s41598-024-61262-5 | www.nature.com/scientificreports/

(4) In urban areas, female mortality increases at approximately age 50, while mortality increases for urban males and rural females at approximately age 45, and rural males show a substantial increase in mortality at approximately age 40. (5) Substantial provincial variation exists, with divergence in mortality patterns for rural males starting at approximately age 35, those for urban males and females starting at approximately age 50, and rural females exhibiting increasing divergence at approximately age 45. Notably, considerable variations exist in mortality probabilities for the elderly population (those over age 60) among provinces.
two significant functions are hospital beds and fatality rate. In this way, we selected four indicators as global influencing variables. The purpose of testing the functional regression coefficients is to determine whether each regression coefficient function is significantly different from zero. The mortality probabilities are close to zero in the initial age groups (1-39 years), leading to very large and nonsignificant p values for the regression coefficients, as these coefficients were already close to zero. However, a coefficient function that is nonsignificant overall is not necessarily nonsignificant in every age group. We therefore conducted pointwise t tests to examine the significance of each functional coefficient in different age groups.
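Once pointwise p values are available, each age group can be flagged with the significance symbols used in Tables 4 and 5 (p < 0.05, p < 0.01, p < 0.001). The sketch below covers only that labeling step and does not perform the t tests themselves; the function name, the age-group labels, and the p values are hypothetical and chosen for illustration.

```python
def significance_stars(p_value):
    """Map a p value to the asterisk notation used in Tables 4 and 5."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie in [0, 1]")
    if p_value < 0.001:
        return "***"  # significant at the 0.1% level
    if p_value < 0.01:
        return "**"   # significant at the 1% level
    if p_value < 0.05:
        return "*"    # significant at the 5% level
    return ""         # not significant at the 5% level


# Hypothetical pointwise p values for one coefficient function (not real results).
pointwise_p = {"0-29": 0.40, "30-44": 0.03, "45-59": 0.004, "60+": 0.0002}
flags = {age: significance_stars(p) for age, p in pointwise_p.items()}
```

A coefficient function can thus carry no symbol in the youngest age groups and still be flagged at older ages, which is the situation the pointwise tests are meant to reveal.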

Table 2. Descriptive statistics of attributes. Urban income and rural income are collectively referred to as 'income' in the analysis, and urban consumption and rural consumption are collectively referred to as 'consumption'.

Table 4. Coefficients and significance of the variables gender, urban-rural, hospital beds, and fatality rate. The number represents the coefficient, and the symbol in the upper right corner represents the significance. Table 5 shares the same format. One asterisk (*) indicates significance at the 5% level, i.e., p < 0.05; two asterisks (**) indicate significance at the 1% level, i.e., p < 0.01; three asterisks (***) indicate significance at the 0.1% level, i.e., p < 0.001.

Table 5. Coefficient values and significance of t tests for regression coefficients of other indicators in various age groups (two-tailed). One asterisk (*) indicates significance at the 5% level, i.e., p < 0.05; two asterisks (**) indicate significance at the 1% level, i.e., p < 0.01; three asterisks (***) indicate significance at the 0.1% level, i.e., p < 0.001.

Analysis of the impact of socioeconomic factors on mortality patterns: low age group (0-29 years)

Figure 4. Marginal impact ranking of factors in the low age group. Note: The size and color intensity of the circles represent the magnitude of the absolute values, where larger circles with darker colors indicate larger absolute values. (The same applies throughout).